1. INTRODUCTION

Coronavirus disease 2019 (COVID-19), which was first identified in Wuhan, China, is caused by severe
acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus that causes acute respiratory infections in
humans. It has spread to many nations since December 12, 2019 [1]-[3]. The illness has reached at least
221 countries, with over 130.3 million reported cases and almost 2.8 million deaths. Individuals infected
with SARS-CoV-2 exhibit typical flu symptoms, including fatigue, fever, dry cough, runny nose, sore throat,
and body aches. The rapid spread of COVID-19 prompted the World Health Organization (WHO) to declare it a
"pandemic" on March 11, 2020 [4]. At present, the WHO advises people to wear face masks if they have
respiratory symptoms or are caring for someone who does. Additionally, several public service providers
refuse service to clients who are not wearing masks. As a result, face mask detection has become a
critical area of research for assisting the worldwide community, yet research in this area remains
limited [5], [6]. Previous research has shown that wearing a face mask helps prevent the transmission of
respiratory infections. For example, N95 and surgical masks are 91% and 68% effective, respectively, in
preventing the spread of SARS [7]. Although much research has been conducted on covered-face detection
algorithms for ATM surveillance, many of these studies failed to consider the ATM surveillance scene
(camera view and camera-to-user distance) as well as face-covering accessories [8], [9].
This paper determines whether a person is wearing a mask by using the NVIDIA Jetson development kit
series and a webcam. Additionally, the surface temperature of the human body is monitored concurrently by
attaching an infrared (IR) array sensor, the AMG8833. If the measured body temperature exceeds the
threshold value, a buzzer sounds an alert. The operating environment can be run in a Docker container that
includes TensorFlow 2, Keras, and the MobileNet model. Since the outbreak of COVID-19, the significance of
mask use and temperature monitoring has grown globally. However, specialized camera equipment is currently
prohibitively costly and does not reach those in need. This work aims to contribute to the democratization
of artificial intelligence (AI) by offering solutions that use low-cost, high-performance AI edge devices
such as the NVIDIA Jetson Nano. The rest of the article is structured as follows: section 2 discusses
previous studies; section 3 describes the methodology and the proposed approach in full; section 4
analyzes the experimental results; finally, section 5 concludes with a discussion.


2. RELATED WORKS 

This section discusses several past approaches to face mask detection that make use of convolutional
neural networks (CNNs). Sethi et al. [10] employed three commonly used baseline models, deep residual
network-50 (ResNet50), AlexNet, and MobileNet, in their experiments. They examined the potential of
integrating these models with their proposed model to obtain highly accurate results with a decreased
inference time. With ResNet50, the proposed technique achieved a high degree of precision (98.2%). Its
accuracy and recall were also 11.07% and 6.44% higher, respectively, than those of a previously published
public baseline model, the RetinaFaceMask detector. The proposed model was a good choice for video
surveillance systems. Jagadeeswari and Theja [11] used learning algorithms to highlight individuals who do
not wear masks. The system was trained to correctly identify whether a person was wearing a mask. When the
algorithm recognizes a person without a mask, an alert is triggered so that the community or the relevant
authorities can be informed and take appropriate action. To build an efficient system that can be used at
large scale, classifiers with different optimizers must be considered. ResNet50, VGG16, and MobileNetV2
were compared using adaptive moment estimation (ADAM), adaptive gradient algorithm (ADAGRAD), and
stochastic gradient descent (SGD) as optimizers. Oumina et al. [12] evaluated several deep CNNs for
extracting deep features from images of faces. The extracted features were processed using a variety of
machine learning classifiers, including the support vector machine (SVM) and K-nearest neighbors (K-NN),
and the performance of all models was assessed using several measures, such as accuracy and precision. The
best classification rate obtained was 97.1%, attained with the combination of SVM and the MobileNetV2
model. Das et al. [13] offered a simplified technique that makes use of fundamental machine learning
libraries such as TensorFlow, Keras, OpenCV, and scikit-learn. The approach correctly detected the face in
the image and determined whether it wore a mask. Additionally, as a surveillance tool, it could detect a
face and a mask in motion. On two distinct datasets, the approach achieved accuracies of 95.77% and
94.58%, respectively. Rao et al. [14] put forward the idea of a facial recognition system that could
identify people who were not wearing masks. Once that data was merged with a public identity database, the
fine amount was sent to the person's cell phone and address. A CNN model was used to distinguish between
those wearing masks and those who were not. To build the model, they employed two convolutional layers,
each with 100 filters, a dropout of 0.5, and ReLU and softmax activation functions for the hidden and
fully connected layers, respectively. Cross-entropy was used as the loss function, and the model was
optimized using Adam. A cascade classifier was used to categorize faces based on approximately 1,500
photos, both with and without masks. It had a precision of 91.21%. Aljumah [15] used decision tables,
neural networks, SVMs, OneR, K-NN, dense neural networks (DNNs), and long short-term memory (LSTM) to
detect coronavirus cases from time-sensitive data. Simulated COVID-19 data was used to test the eight
algorithms after selecting the suitable symptoms. According to the results, five of these eight algorithms
had a success rate of more than 90%. Suresh et al. [16] addressed face mask detection and notification
more thoroughly. The proposed system was trained and evaluated using Kaggle datasets. The system ran in
real time and detected whether an individual's face was covered by a face mask. If not, the individual was
notified by text message. Real-time images of faces in public were fed into a CNN as input. Vinitha and
Velantina [17] used a live camera feed and sounded an alarm (buzzer) when someone was not wearing a mask.
The objective was to determine whether the individual in an image or video stream was wearing a face mask
using computer vision and deep learning. Sen and Sawant [18] proposed a mask detection system capable of
detecting any form of mask, including masks of varying shapes, in video streams to support compliance with
government standards. For mask detection in images and video streams, a deep learning algorithm was used,
implemented with the Python PyTorch package. The system was capable of distinguishing between those who
wore masks and those who did not. Mundial et al. [19] used a combination of supervised learning and
DNN-based facial features to recognize masked faces. A dataset of masked faces was used to train an SVM
classifier on state-of-the-art facial recognition feature vectors. The technique yields up to 97% accuracy
on masked faces. Nagrath et al. [20] used the single shot multibox detector as a face detector and the
MobileNetV2 architecture as the framework for the classifier, which was very lightweight and could be used
in embedded devices for real-time mask detection. The model achieved an accuracy of 0.9264 and an F1 score
of 0.93.


3. METHOD 

Since the outbreak of COVID-19, the significance of mask use and temperature monitoring has grown
globally. However, specialized camera equipment is currently prohibitively costly and does not reach those
in need. The system aims to contribute to the democratization of artificial intelligence by offering
solutions that use low-cost, high-performance AI edge devices such as the NVIDIA Jetson Nano. This paper
investigates a surveillance system that detects masks with a pre-trained CNN model and is equipped with an
AMG8833 sensor and a buzzer for alerts.


3.1. MobileNetV2 

Face mask recognition in this study is accomplished with MobileNetV2, a machine learning technique,
rather than a conventional visual classifier. The model offers improved computational speed and efficiency
[21] and can be used in both high- and low-compute environments. MobileNetV2 builds on the principles of
the first version [22]. MobileNetV1 is built on a two-layer structure. The first layer is a depthwise
convolution, which applies a single convolution filter to each input channel for lightweight filtering.
The second layer is a 1x1 convolution, known as a pointwise convolution, which builds new features by
computing linear combinations of the input channels. ReLU6 serves as the nonlinearity; it is often used
because of its robustness in low-precision computation. MobileNetV2 has two kinds of blocks [23]: a
residual block with a stride of one and a downsizing block with a stride of two. Both kinds of blocks have
three layers. The first layer is a 1x1 convolution with ReLU6. The second layer is the depthwise
convolution described above. The third layer is another 1x1 convolution, but without any nonlinearity. If
ReLU were applied again, deep networks would have only the power of a linear classifier on the non-zero
part of the output domain. The MobileNetV2 network has an initial convolution layer followed by 19
bottleneck layers [24]. Figure 1 illustrates the MobileNet architecture.
Figure 1. MobileNet architecture
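The efficiency gain of the depthwise and pointwise convolutions described in this subsection can be checked with a short weight count. The following is a minimal sketch rather than the authors' code; the kernel size and channel counts are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes the channels
    return depthwise + pointwise


def relu6(x: float) -> float:
    """ReLU6 clips an activation to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


# Illustrative sizes (assumed, not taken from the paper).
k, c_in, c_out = 3, 32, 64
print(standard_conv_params(k, c_in, c_out))        # 18432
print(depthwise_separable_params(k, c_in, c_out))  # 2336
```

For these assumed sizes, the separable form needs roughly one eighth of the weights of a standard convolution, which is the kind of saving that makes the model attractive for edge devices.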


3.2. Dataset 
Gathering data is the first step in constructing a face mask recognition model. The dataset contains
images of mask wearers and non-mask wearers and is derived from a mask-related dataset [25], [26]. It was
created by combining masked and unmasked photographs and includes 2,165 masked images and 1,930 unmasked
images. Each image is cropped so that only the subject's face is visible. The data must first be labeled
to separate the images that show a mask from the rest; once labeled, the data is divided into two distinct
categories. The data must be pre-processed before training and testing can begin. Pre-processing consists
of four steps: resizing the image, converting the image into an array, pre-processing the input with
MobileNetV2, and performing one-hot encoding on the labels. Resizing is an important pre-processing step
in computer vision, because in many cases models run better when the images are reduced in size. In this
study, the images are resized to 224x224 pixels. The images are then converted into arrays so that they
can be processed in a loop, and the arrays are pre-processed for the MobileNet model. Finally, the labels
are one-hot encoded, since many learning algorithms cannot operate on categorical labels directly and
require all input and output variables to be numeric. The labels are therefore converted into numerical
values. Training data accounts for 75% of the entire data, while testing data accounts for the remaining
25%. Both subsets contain images with masks and images without masks.
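The label encoding and data split described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the class names, the helper names, and the random seed are hypothetical, and the pixel scaling assumes that the MobileNetV2 pre-processing maps 8-bit values to the range [-1, 1], as the Keras implementation does.

```python
import random

LABELS = ["with_mask", "without_mask"]  # hypothetical class names


def scale_pixel(value: int) -> float:
    """Map an 8-bit pixel value from [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0


def one_hot(label: str) -> list:
    """Encode a class name as a one-hot vector over LABELS."""
    return [1 if label == name else 0 for name in LABELS]


def train_test_split(items: list, test_fraction: float = 0.25, seed: int = 42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


print(one_hot("without_mask"))  # [0, 1]
train, test = train_test_split(list(range(100)))
print(len(train), len(test))    # 75 25
```

In practice a library routine would normally handle the split and keep the class proportions equal in both subsets; the hand-written version is shown only to make the 75%/25% division explicit.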


3.3. Building and testing the model 

This stage comprises building the training image generator, constructing the base model with
MobileNetV2, adding the model parameters, compiling the model, training the model on a MacBook with an M1
processor, and saving the model for future predictions on the Jetson Nano Kit. An Apple M1 computer with
16 GB of random access memory (RAM), an 8-core central processing unit (CPU), and a graphics processing
unit (GPU) is used for testing. Python 3.8 [27] is used to conduct the experiments. The formulas for
evaluating the MobileNetV2 model are given in (1)-(4), based on [28].


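$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$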


In these equations, TP denotes true positives, TN true negatives, FP false positives, and FN false
negatives [29]. True positives are images that are labeled positive and that the model also predicts as
positive. Similarly, true negatives are images that are labeled negative and that the model predicts as
negative. False positives are images that are labeled negative but that the model predicts as positive.
False negatives are images that are labeled positive but that the model predicts as negative. Precision
indicates the fraction of predicted positives that are actually positive. Recall quantifies a classifier's
ability to identify all positive cases, while the F1-score combines precision and recall into a single
measure of test accuracy. These evaluation measures were selected because they give reliable results on a
balanced dataset. Model testing is divided into stages to verify that the model can make accurate
predictions. In the first stage, predictions are made on the testing set.
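Equations (1)-(4) translate directly into code. The sketch below is illustrative only: the function name is hypothetical, the confusion counts are made-up values rather than results from this study, and the denominators are assumed to be non-zero.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # equation (1)
    precision = tp / (tp + fp)                           # equation (2)
    recall = tp / (tp + fn)                              # equation (3)
    f1 = 2 * precision * recall / (precision + recall)   # equation (4)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```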


3.4. Hardware components 
3.4.1. Jetson Nano and secure digital card 

The NVIDIA Jetson Nano Kit, shown in Figure 2, makes it possible to run modern AI workloads with
previously unavailable scalability, power, and price [30]. Developers, educators, and makers can now use
AI frameworks and models to build applications for image classification, object recognition, and speech
processing, among other tasks. The Kit is powered via micro-universal serial bus (micro-USB) and includes
a wide range of I/O, from general-purpose input/output (GPIO) to camera serial interface (CSI). This makes
it easy to connect a wide variety of sensors for different AI applications. It consumes as little as
5 watts of power, making it very energy efficient [31]. Secure digital (SD) is a removable memory card
format used to read and write large quantities of data in a variety of mobile devices, cameras, and smart
devices.
Figure 2. Jetson Nano


3.4.2. AMG8833 sensor and C920e webcam 

Panasonic developed the AMG8833 64-pixel temperature sensor, shown in Figure 3, as part of its
Grid-EYE product line. The sensor is composed of an 8x8 array of infrared thermopiles that detect the
infrared radiation emitted by emissive bodies to determine their temperature. The Grid-EYE communicates
with the Jetson Nano through the inter-integrated circuit (I2C) interface. The AMG8833 has a lens that
limits the viewing angle of the sensor to 60 degrees, resulting in a sensing area appropriate for objects
in the mid-field (as opposed to the far-field or near-field). Additionally, it operates between 3.3 and
5 volts, with a sample rate of 1 Hz to 10 Hz and a temperature resolution of 0.25°C over the temperature
range of 0°C to 80°C. The AMG8833 is well suited to non-contact temperature measurement applications such
as thermal imaging, heat transfer analysis, human temperature monitoring, heating and air conditioning
management, and industrial control. The C920e webcam supports several resolutions, including 1080p
(Full HD) at 30 frames per second and 720p (HD) at 30 frames per second.


Figure 3. AMG8833 
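To make the 8x8 temperature grid concrete, the sketch below converts raw pixel readings into degrees Celsius and extracts the hottest value in a frame. It is a hedged illustration, not the authors' code: it assumes that each pixel is reported as a 12-bit two's-complement count in steps of 0.25°C, the function names are hypothetical, and the frame is synthetic rather than read from a sensor over I2C.

```python
PIXEL_STEP_C = 0.25  # assumed degrees Celsius per raw count


def raw_to_celsius(raw: int) -> float:
    """Convert an assumed 12-bit two's-complement pixel count to degrees Celsius."""
    raw &= 0x0FFF          # keep the low 12 bits
    if raw & 0x0800:       # sign bit set, so the value is negative
        raw -= 0x1000
    return raw * PIXEL_STEP_C


def frame_max_celsius(raw_frame: list) -> float:
    """Return the hottest pixel of an 8x8 frame of raw counts."""
    return max(raw_to_celsius(v) for row in raw_frame for v in row)


# Synthetic frame: background at 25.0 °C with one pixel at 37.0 °C.
frame = [[100] * 8 for _ in range(8)]
frame[3][4] = 148
print(frame_max_celsius(frame))  # 37.0
```

Taking the maximum over the grid is one simple way to reduce a frame to a single reading; averaging the warmest few pixels would be an equally plausible choice.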


3.4.3. Power supply, screen, HDMI cable, breadboard, wired jumper cables, relay, and buzzer with 
battery 9 V 

The power supply accepts 110-220 V alternating current (AC) input and provides 5 V direct current
(DC) output at up to 4 A. The screen connects to the Jetson Nano via a high-definition multimedia
interface (HDMI) cable. Jumper wires connect the pins of the Nano Kit to the 830-tie-point solderless
prototype breadboard. The buzzer and its battery are wired to a JQC-3FF-S-Z 5 V relay manufactured by
Tongling.


3.5. The final setup 

Data for mask detection is first loaded from the dataset. Data preparation makes use of deep
learning (DL) libraries. The MobileNetV2 classifier is trained using TensorFlow, Keras, and OpenCV. A
MacBook M1 with fast CPU and GPU capabilities is used to train and test the mask detection model,
resulting in an accuracy of 99% during training and 100% during testing. The trained model is then
deployed on the Jetson Nano, a low-cost development Kit. The Nano Kit and a Logitech C920e USB camera are
used to capture real-time video. Once the face mask classification model has been developed, faces are
extracted from images and video streams as needed, and the model determines whether each person is wearing
a mask. An additional sensor, the AMG8833, is used to measure body temperature, and a buzzer sounds if a
person's body temperature exceeds a certain threshold. Figures 4(a) and (b) depict the flowchart of the
proposed system.
Figure 4. Flow chart of proposed system (a) training and testing the model and (b) implementing the model


4. EXPERIMENTS AND RESULTS 

The system determines whether a person is wearing a mask by using the NVIDIA Jetson development Kit
series and a webcam. Additionally, the surface temperature of the human body is monitored concurrently by
attaching an IR array sensor, the AMG8833. If the measured body temperature exceeds the threshold value,
which has been set to 37°C, a buzzer sounds an alert. The operating environment can be run in a Docker
container that includes TensorFlow 2, Keras, and the MobileNet model.
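The decision logic of the system can be summarized in a few lines. The sketch below is an assumption-laden illustration, not the deployed code: the `Observation` type, the `evaluate` function, and the 0.5 mask cutoff are hypothetical, and only the 37°C threshold comes from the text.

```python
from dataclasses import dataclass

TEMP_THRESHOLD_C = 37.0  # threshold stated in the text


@dataclass
class Observation:
    mask_probability: float  # hypothetical classifier output in [0, 1]
    surface_temp_c: float    # hottest temperature reported by the IR array


def evaluate(obs: Observation, mask_cutoff: float = 0.5) -> dict:
    """Decide the mask status and whether the buzzer should sound."""
    wearing_mask = obs.mask_probability >= mask_cutoff
    buzzer_on = obs.surface_temp_c > TEMP_THRESHOLD_C  # strictly above threshold
    return {"wearing_mask": wearing_mask, "buzzer_on": buzzer_on}


print(evaluate(Observation(mask_probability=0.93, surface_temp_c=36.4)))
print(evaluate(Observation(mask_probability=0.12, surface_temp_c=37.8)))
```

Keeping the decision in a pure function like this separates it from the camera, sensor, and relay code, so the thresholds can be checked without any hardware attached.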


4.1. Training the model 

The model's loss and accuracy were evaluated over twenty epochs during training on the MacBook M1
chip, as shown in Figure 5. From the second epoch onward, accuracy increases while loss decreases. Once
the accuracy stabilizes, further iterations are not needed to improve the model. Table 1 shows the results
of the evaluation of the model in the next stage. The macro average (MA) calculates F1 for each label and
returns the unweighted mean, without taking the fraction of each label in the dataset into consideration.
The weighted average (WA) calculates F1 for each label and weights each score by the fraction of that
label in the dataset. Figure 6 shows a prediction run on the MacBook M1 chip.
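The difference between the macro and weighted averages can be illustrated with a short calculation. In this sketch the function names are hypothetical; the first call uses the per-label F1-scores and supports reported in Table 1, and the second uses made-up values chosen only to show a case where the two averages differ.

```python
def macro_average(scores: list) -> float:
    """Unweighted mean of the per-label scores."""
    return sum(scores) / len(scores)


def weighted_average(scores: list, supports: list) -> float:
    """Mean of the per-label scores weighted by each label's support."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# Per-label F1-scores and supports from Table 1 (mask, no mask).
print(round(macro_average([0.99, 0.99]), 2))                 # 0.99
print(round(weighted_average([0.99, 0.99], [433, 386]), 2))  # 0.99

# Made-up imbalanced example where the two averages differ.
print(round(macro_average([0.90, 0.60]), 3))                 # 0.75
print(round(weighted_average([0.90, 0.60], [300, 100]), 3))  # 0.825
```

Because the two classes here have similar supports and identical F1-scores, both averages coincide at 99%.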


Figure 5. Training model (legend: loss, accuracy, validation loss, validation accuracy)


Table 1. Evaluation of the model
            Support   Recall   F1-score   Precision
Mask        433       100%     99%        99%
No mask     386       98%      99%        100%
Accuracy    819       99%
MA          819       99%      99%        99%
WA          819       99%      99%        99%


Figure 6. Predicting on MacBook M1 


4.2. Implementing the model 

After the model is trained and tested on a MacBook M1, it is deployed on the Jetson Nano, and the
pins of the Nano Kit are connected to the AMG8833 sensor. Figure 7 depicts the final setup and illustrates
the connections between the AMG8833, relay, buzzer, battery, and Jetson Nano. The results of the proposed
surveillance system on the NVIDIA Jetson Nano GPU development board are shown in Figure 8.


Connections labeled in Figure 7: the JQC-3FF-S-Z 5 V relay takes IN, VCC (5 V), and GND from the Jetson
Nano; the battery's positive terminal goes to the relay's NO terminal and its negative terminal to the
buzzer's black wire; the buzzer's red wire goes to the relay's COM terminal; the AMG8833 takes Vin (3.3 V)
from pin 1, SDA from pin 3, SCL from pin 5, and GND from pin 6 of the Jetson Nano; the C920e camera
connects to the Jetson Nano by USB.

Figure 7. The final setup 
Figure 8. Surveillance proposed system on Jetson Nano


5. CONCLUSION

The proposed system, built on the NVIDIA Jetson Nano and the lightweight MobileNetV2 model, can help
limit the spread of COVID-19. Face mask detection is best deployed in high-traffic areas, such as retail
malls, public transit stations, and office buildings. Once installed, the system can be configured to
process either a live stream or an archived video feed. Real-time detection models of this kind can be
quite useful in edge surveillance systems that recognize masks and masked faces, measure human body
temperature, and sound a buzzer when the temperature exceeds a specific threshold. Testing on the MacBook
M1 indicated that the model reached 99% accuracy in training and 100% accuracy in testing.